Addax distribution

Addax (Addax nasomaculatus) are considered one of the rarest antelopes on earth, with an estimated population of <100 individuals in the wild. We assessed the distribution and occurrence of this species from field surveys collected by the Sahara Conservation Fund over a 7-year period (2008-2014). Our results provide insight into the factors contributing to species occurrence and are guiding field surveys to areas that have the potential to support small and geographically isolated populations of addax. We incorporated field-derived variables of vegetation cover with remote sensing measures of vegetation productivity (NDVI - Normalized Difference Vegetation Index) and surface roughness (derived from 30-m SRTM). Models were fit in a generalized linear regression framework to predict and evaluate species occurrence.

Addax moving across shifting sands in the Tin Toumma desert, Niger (Photo: T. Rabeil, SCF)

Lab Exercise

In this exercise, we will follow the steps detailed in:

Stabach, J.A., T. Rabeil, V. Turmine, T. Wacher, T. Mueller, and P. Leimgruber. 2017. On the brink of extinction - Habitat selection of addax and dorcas gazelle across the Tin Toumma desert, Niger. Diversity and Distributions 23:581-591.

We will use R for all analyses to create a species distribution model, highlighting the variables important in predicting addax occurrence. Like other exercises in R, we will first load the necessary packages to perform the analyses. Use help() for any operations that you don’t understand or that require additional information.

Load Packages

#install.packages("packagename")
library(usdm)
library(arm)
library(visreg)
library(pROC)
library(DAAG)
library(fields)
library(MuMIn)
library(gstat)
library(sp)
library(raster)    # needed later for raster(), focal(), focalWeight(), resample()
library(lubridate)

Set Working Directory

Use getwd() to determine your current working directory. If necessary, use the setwd() command to change to a different location.

getwd()
#setwd("D:/whatever")

Addax Data File

The raster data layers (NDVI and surface roughness) have already been extracted at plot locations. This has been done to focus on modeling the statistical relationships in this exercise. Building the spatial database, however, is hugely important and should not be overlooked. The layers included in analyses will ultimately depend on your scientific research questions and what is biologically relevant for your species.

NDVI and surface roughness were summarized by calculating the mean value of all pixels within a 2.5-km radius of each plot location (a moving window analysis). Plot locations were spaced at 5-km intervals along line transects. Each line transect varied in length between 50 km and 100 km, and transects were spaced ~10 km apart. Transects were sampled repeatedly across years. Information on how to create and extract surface roughness is included as an addendum to this exercise.
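The moving-window mean can be illustrated with a toy grid: average all cells whose centres fall within a given radius of a plot location. This is a base-R sketch only; the grid, the fake NDVI values, and the window_mean() helper are illustrative, not part of the exercise data.

```r
set.seed(1)                                  # reproducible fake surface
res <- 250                                   # cell size in metres (as in MOD13Q1)
xy  <- expand.grid(x = seq(125, 5000, res),  # cell-centre coordinates
                   y = seq(125, 5000, res))
xy$ndvi <- runif(nrow(xy), 0.05, 0.15)       # fake NDVI values

window_mean <- function(px, py, radius = 2500) {
  d <- sqrt((xy$x - px)^2 + (xy$y - py)^2)   # distance from plot to each cell centre
  mean(xy$ndvi[d <= radius])                 # mean of cells inside the window
}

window_mean(2500, 2500)   # mean NDVI within 2.5 km of a plot at (2500, 2500)
```

Later in the exercise, the same idea is applied to full rasters with focalWeight() and focal().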

All data (including animal counts) were summarized at plot locations. We re-coded animal counts within a 2.5-km radius to a measure of occurrence (i.e., presence/absence). Thus, we modeled the data as a series of 1’s and 0’s, representing addax occurrence at plot locations. Data were aggregated in this fashion because of variability between surveys (i.e., the transect locations didn’t overlap exactly) and because we did not have confidence in the accuracy of the number of individuals recorded at each sighting. In addition, distances to animal sightings were only recorded in a subset of the surveys. Sightings >500 m from the transect were removed due to an assumed undercounting bias (confirmed by investigating the frequency of sightings in relation to distance). This allowed for a conservative broad-scale approach to incorporate extremely messy field data collected over multiple years. See Stabach et al. 2017 for more details.

# Load prepared data file. NDVI and surface roughness already extracted.
load(file="./Data/Addax_Dataset.RData")
# Look at data
head(Sub.All2)
##   Pt_ID       Date YearMonth Id_1        X       Y Addax Dorcas Cornul
## 1     1 2008-12-06    200812   23 767911.0 1742830     0      0      0
## 2     2 2008-12-06    200812   23 770769.0 1744528     0      0      0
## 3     3 2008-12-06    200812   23 776095.8 1747469     0      5      0
## 4     4 2008-12-06    200812   23 780842.3 1750128     0      4      0
## 5     5 2008-12-06    200812   23 785517.3 1752312     0     16      0
## 6     6 2008-12-06    200812   23 790001.9 1754484     0      6      0
##   Stipa1 Stipa2 Human      TRI           TPI    ROUGH Index      ndvi
## 1      0      0     0 2.692203  1.680855e-03 8.343433     1 0.1223870
## 2      0      0     0 2.483226 -9.582118e-05 7.700197     1 0.1219408
## 3      1      0     0 1.617903 -5.089201e-04 5.044250     1 0.1201930
## 4      1      0     0 1.708651  2.159401e-03 5.331611     1 0.1287899
## 5      0      0     0 2.000507  4.656293e-03 6.293324     1 0.1247775
## 6      1      0     0 2.029911  4.920087e-04 6.445662     1 0.1234554

Columns in the database include the Date of survey, the X and Y location of each plot, the number of addax and dorcas gazelle (a sympatric species) sighted at each location, a unique plot ID (Pt_ID), and the presence/absence of the vegetation species Cornulaca monacantha (Cornul), Stipagrostis acutiflora (Stipa1), and Stipagrostis vulnerans (Stipa2). These vegetation species were thought a priori to influence addax occurrence and were recorded at each plot location. Human disturbance (e.g., footprints, sightings, tire tracks) was also recorded in the Human column.

Surface roughness (ROUGH) is defined as the local elevation range (i.e., the difference between the minimum and maximum values of a cell and its eight surrounding neighbors). NDVI (Normalized Difference Vegetation Index) is strongly correlated with a region’s vegetation productivity/greenness and has been used extensively as a parameter in models predicting animal movement and habitat use. NDVI data (MOD13Q1) were downloaded as 16-day cloud-free composites at 250-m resolution.
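The roughness definition above (local elevation range over a 3×3 window) can be sketched in base R on a made-up elevation matrix. The matrix and the local_range() helper are assumptions for illustration, not the actual SRTM workflow (which is covered in the addendum).

```r
# Made-up 5x5 elevation matrix (metres)
elev <- matrix(c(10, 11, 12, 13, 14,
                 11, 12, 15, 14, 15,
                 12, 13, 14, 15, 16,
                 13, 14, 15, 16, 17,
                 14, 15, 16, 17, 18), nrow = 5, byrow = TRUE)

# Roughness at cell (i, j): max - min over the cell and its eight neighbors
local_range <- function(m, i, j) {
  rows <- max(1, i - 1):min(nrow(m), i + 1)   # clip the 3x3 window at the edges
  cols <- max(1, j - 1):min(ncol(m), j + 1)
  win  <- m[rows, cols]
  max(win) - min(win)
}

local_range(elev, 2, 3)  # window spans 11..15, so the local range is 4
```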

Database Edits

Some additional steps were necessary to prepare the data file for modeling. First, we added the Season of survey (a factor, based on the data collection date) to investigate changes in occurrence throughout the year. Then, we recoded the addax and dorcas gazelle counts to presence/absence to investigate species interactions. Finally, we added the year of survey to investigate longitudinal changes over the study period.

# Create a Month Field
Sub.All2$Month <- as.numeric(strftime(Sub.All2$Date,format="%m",tz="GMT"))
# Add in the Season
Sub.All2$Season <- ifelse(Sub.All2$Month >=3 & Sub.All2$Month <=6, "Dry",
                          ifelse(Sub.All2$Month >=7 & Sub.All2$Month <=10, "Wet",
                                 ifelse(Sub.All2$Month >=11 & Sub.All2$Month <=12, "Cold","Fix Problem")))
# Make a dataframe (remove from spatial format)
Sub.All2 <- as.data.frame(Sub.All2)

# Re-code the occurrence records as presence/absence 
Sub.All2$obsAddax <- ifelse(Sub.All2$Addax > 0, 1, 0) 
Sub.All2$obsDorc <- ifelse(Sub.All2$Dorcas > 0, 1, 0)

# Make appropriate fields factors (presence/absence)
cols <- c("Cornul", "Stipa1", "Stipa2", "Season", "obsAddax", "obsDorc") 
Sub.All2[cols] <- lapply(Sub.All2[cols], factor)

# Include survey year as a factor in models. Not included above because I am changing the column name. 
Sub.All2$Year <- as.factor(year(Sub.All2$Date))

Summarize Dataset

Execute a query to summarize which years were surveyed in each season, then tabulate each survey to estimate species prevalence. Graph the data to visualize the patterns. How would you change the plotting function to visualize the extent of every survey?

# Aggregate
aggregate(as.character(Year) ~ Season, data = Sub.All2, unique)
##   Season                 as.character(Year)
## 1   Cold 2008, 2009, 2010, 2012, 2013, 2014
## 2    Dry             2009, 2010, 2011, 2014
## 3    Wet             2009, 2010, 2011, 2014
# Place data in a table to summarize results. 
Unique.ID <- unique(Sub.All2$YearMonth)
# Create matrix to hold everything
dat6 <- matrix(NA,nrow = length(Unique.ID),ncol = 9,dimnames = list(c(),c("YearMonth","Year","Season","PresAddax","PresDorc","PrevAdd","PrevDorc","One","Both")))

for (i in 1:length(Unique.ID)){
    temp <- subset(Sub.All2, YearMonth == Unique.ID[i])
        # Summarize dataset
        dat6[i,1] <- Unique.ID[i]
        dat6[i,2] <- as.character(unique(temp$Year))
        dat6[i,3] <- as.character(unique(temp$Season))
            obs.Add <- ifelse(temp$Addax > 0, 1, 0)
            obs.Dorc <- ifelse(temp$Dorcas > 0, 1, 0)
            Both <- obs.Add + obs.Dorc
            Both2 <- ifelse(Both > 1, 1, 0) # Vector where both present
            One <- ifelse(Both > 0, 1, 0) # At least one species present
        dat6[i,4] <- sum(obs.Add)
        dat6[i,5] <- sum(obs.Dorc)
        dat6[i,6] <- round(sum(obs.Add)/length(obs.Add)*100,digits=1)
        dat6[i,7] <- round(sum(obs.Dorc)/length(obs.Add)*100,digits=1)
        dat6[i,8] <- round(sum(One)/length(obs.Add)*100,digits=1)
        dat6[i,9] <- round(sum(Both2)/length(obs.Add)*100,digits=1)

        # Output each survey...plot only the first survey
        if(i==1){
        plot(temp$X,temp$Y,xlab="Easting",ylab="Northing",xlim=c(min(Sub.All2$X),max(Sub.All2$X)), ylim=c(min(Sub.All2$Y),max(Sub.All2$Y)), frame=FALSE, main=Unique.ID[i], asp=1)
}}

# Look at result.  This should match Table 1 in Stabach et al. 2017.
dat6 <- as.data.frame(dat6)
dat6
##    YearMonth Year Season PresAddax PresDorc PrevAdd PrevDorc  One Both
## 1     200812 2008   Cold        22       28    33.3     42.4 68.2  7.6
## 2     200906 2009    Dry         6       16     9.1     24.2 28.8  4.5
## 3     200909 2009    Wet         8       19    12.1     28.8 39.4  1.5
## 4     200912 2009   Cold        16       33    24.2       50 63.6 10.6
## 5     201004 2010    Dry         4       15     6.1     22.7 24.2  4.5
## 6     201006 2010    Dry         8       19    12.1     28.8 37.9    3
## 7     201009 2010    Wet         3       13     4.5     19.7 24.2    0
## 8     201012 2010   Cold        14       29    15.9       33 44.3  4.5
## 9     201103 2011    Dry         7        4    15.9      9.1 15.9  9.1
## 10    201107 2011    Wet        10       12    15.2     18.2 24.2  9.1
## 11    201212 2012   Cold         8        9    12.1     13.6 21.2  4.5
## 12    201311 2013   Cold        16       35    24.2       53 60.6 16.7
## 13    201403 2014    Dry         4        9     6.1     13.6 16.7    3
## 14    201409 2014    Wet         4        8     6.1     12.1 18.2    0
## 15    201412 2014   Cold         1       34     1.5     51.5 51.5  1.5
# Write to file if necessary
#write.csv(dat6,file="./Addax_Dorcas_Prevalence.csv", quote = FALSE, row.names = FALSE)

Scaling Parameter Values

It is often helpful, and sometimes necessary, to scale continuous parameters that have vastly different value ranges (e.g., elevation and NDVI); doing so can help with model convergence. While the conclusions will remain the same, it is important to remember that the scaled data are no longer in the original units and must be back-transformed when making raster predictions. This is critical.
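A minimal sketch of the scale/back-transform round trip, using a made-up vector rather than the survey data: scale() subtracts the mean and divides by the standard deviation, so multiplying by the sd and adding the mean back recovers the original values.

```r
x  <- c(2, 4, 6, 8, 10)                       # made-up values
xs <- as.numeric(scale(x, center = TRUE, scale = TRUE))

x_back <- xs * sd(x) + mean(x)                # back-transform: x * sd + mean

all.equal(x, x_back)                          # TRUE: original values recovered
```

This is exactly the operation applied to the raster layers later in the exercise, using the means and standard deviations stored in Scale.Val.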

# Scale the parameters...Create a copy of the dataset
Sub.Scale <- Sub.All2

# Scale variables and append into original dataframe
# Check out the scale command to make sure you know what the scale function (with center = TRUE) is doing
Sub.Scale$shuman <- as.numeric(scale(Sub.Scale[,"Human"],center=TRUE))
Sub.Scale$sndvi <- as.numeric(scale(Sub.Scale[,"ndvi"],center=TRUE))
Sub.Scale$srough <- as.numeric(scale(Sub.Scale[,"ROUGH"],center=TRUE))
Sub.Scale$sDorcas <- as.numeric(scale(Sub.Scale[,"Dorcas"],center=TRUE))
Sub.Scale$sAddax <- as.numeric(scale(Sub.Scale[,"Addax"],center=TRUE))

# Create a blank matrix to hold the mean and standard deviation of each continuous variable included in the dataset. These values will be necessary to make the prediction map.
Scale.Val <- matrix(NA, 2, ncol(Sub.Scale[,c(15,17,8,12,7)]), dimnames=list(c("Mn","Sd"),c("Rough","NDVI","Dorcas","Human","Addax")))

# Calculate the mean and standard deviations.
Scale.Val[1,] <- apply(Sub.Scale[,c(15,17,8,12,7)],2,mean)
Scale.Val[2,] <- apply(Sub.Scale[,c(15,17,8,12,7)],2,sd)

# Look at values
Scale.Val
##       Rough        NDVI    Dorcas     Human    Addax
## Mn 6.618164 0.092855046  2.922222 0.7393939 1.476768
## Sd 1.464744 0.009629818 10.710220 1.8006750 8.336148
# Summarize the data.  Here, I'm just specifying the column numbers.
# How to determine which columns are being referenced? 
data.scale <- scale(Sub.Scale[,c(15,17,8,12,7)])
(summary.data <- apply(data.scale,2,summary))
##                 ROUGH          ndvi        Dorcas         Human
## Min.    -2.822142e+00 -2.702640e+00 -2.728443e-01 -4.106204e-01
## 1st Qu. -6.399369e-01 -6.891035e-01 -2.728443e-01 -4.106204e-01
## Median  -2.043973e-01 -8.768665e-02 -2.728443e-01 -4.106204e-01
## Mean     9.123348e-17  1.666452e-17  2.937262e-17  4.479617e-18
## 3rd Qu.  6.266297e-01  5.734426e-01 -1.794755e-01  1.447269e-01
## Max.     2.953631e+00  4.600067e+00  1.242531e+01  9.030284e+00
##                 Addax
## Min.    -1.771523e-01
## 1st Qu. -1.771523e-01
## Median  -1.771523e-01
## Mean     7.929132e-18
## 3rd Qu. -1.771523e-01
## Max.     1.973612e+01

Correlation Analysis

Evaluate whether we have data redundancy. Are there any continuous variables with high levels of correlation? What is a reasonable correlation threshold?
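Under the hood, a variance inflation factor is 1 / (1 − R²), where R² comes from regressing one predictor on all the others. A hedged toy sketch with simulated data (not the addax covariates; the vif_manual() helper is illustrative):

```r
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.5)  # deliberately correlated with x1
x3 <- rnorm(n)                       # independent of the others

# VIF of one predictor: regress it on the remaining predictors
vif_manual <- function(target, others) {
  r2 <- summary(lm(target ~ ., data = as.data.frame(others)))$r.squared
  1 / (1 - r2)
}

vif_manual(x1, cbind(x2, x3))  # inflated, because x1 and x2 are correlated
vif_manual(x3, cbind(x1, x2))  # close to 1: x3 carries independent information
```

A common rule of thumb flags VIFs above ~5-10 (or pairwise correlations above ~0.7) as problematic; vifstep() automates the stepwise removal.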

# Group variables together in a dataframe
data.all <- as.data.frame(cbind(Sub.Scale$srough,Sub.Scale$sndvi,Sub.Scale$sDorcas,Sub.Scale$shuman,Sub.Scale$sAddax))

# Variance Inflation Analysis
(eval.vif <- vifstep(data.all))
## No variable from the 5 input variables has collinearity problem. 
## 
## The linear correlation coefficients ranges between: 
## min correlation ( V5 ~ V3 ):  0.002850639 
## max correlation ( V2 ~ V1 ):  -0.3029674 
## 
## ---------- VIFs of the remained variables -------- 
##   Variables      VIF
## 1        V1 1.173900
## 2        V2 1.126290
## 3        V3 1.044492
## 4        V4 1.056243
## 5        V5 1.005890
# Or, use the cor function...will give essentially the same result in a different format
#cor(data.all)

Data Model

Generalized Linear Model (GLM)

Model the occurrence of addax in a Generalized Linear Model (GLM) framework. Our goal here is not necessarily to create the very best model. Instead, we aim to identify the response of each variable in predicting addax occurrence. We then evaluate a submodel that contains only the remote sensing layers to predict habitat suitability across the landscape, and can easily assess how this ‘submodel’ compares with our ‘full’ model or the ‘best’ model. Why is using a GLM advantageous given the data and objectives?

# Create a full model with all the variables you think are important predictors of addax occurrence
glm.Addax <- glm(obsAddax ~ srough + I(srough^2) + sndvi + I(sndvi^2) + Human + obsDorc + Stipa1 + Stipa2 + Cornul + Season + Year, data = Sub.Scale, family = binomial(link="logit"))
# Summarize result and look at the confidence intervals
summary(glm.Addax)
## 
## Call:
## glm(formula = obsAddax ~ srough + I(srough^2) + sndvi + I(sndvi^2) + 
##     Human + obsDorc + Stipa1 + Stipa2 + Cornul + Season + Year, 
##     family = binomial(link = "logit"), data = Sub.Scale)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6707  -0.5032  -0.3257  -0.1610   3.1998  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.04781    0.43617   0.110 0.912715    
## srough       0.92886    0.18762   4.951 7.39e-07 ***
## I(srough^2) -0.67000    0.13669  -4.902 9.50e-07 ***
## sndvi       -0.14342    0.18245  -0.786 0.431830    
## I(sndvi^2)  -0.28571    0.10769  -2.653 0.007976 ** 
## Human       -0.26027    0.11153  -2.334 0.019610 *  
## obsDorc1     0.85616    0.25931   3.302 0.000961 ***
## Stipa11     -0.30643    0.25512  -1.201 0.229697    
## Stipa21      1.04595    0.26836   3.898 9.71e-05 ***
## Cornul1      0.56465    0.27663   2.041 0.041238 *  
## SeasonDry   -0.17105    0.37617  -0.455 0.649327    
## SeasonWet   -0.24657    0.34744  -0.710 0.477898    
## Year2009    -1.53933    0.51044  -3.016 0.002564 ** 
## Year2010    -2.14737    0.52032  -4.127 3.67e-05 ***
## Year2011    -2.14421    0.62789  -3.415 0.000638 ***
## Year2012    -2.67330    0.62883  -4.251 2.13e-05 ***
## Year2013    -1.34818    0.50000  -2.696 0.007011 ** 
## Year2014    -3.95420    0.59485  -6.647 2.98e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 773.74  on 989  degrees of freedom
## Residual deviance: 606.17  on 972  degrees of freedom
## AIC: 642.17
## 
## Number of Fisher Scoring iterations: 6
confint(glm.Addax)
## Waiting for profiling to be done...
##                   2.5 %      97.5 %
## (Intercept) -0.81328167  0.90264650
## srough       0.57355364  1.31099018
## I(srough^2) -0.95703191 -0.42014356
## sndvi       -0.50435089  0.21297658
## I(sndvi^2)  -0.51741444 -0.09355030
## Human       -0.50240964 -0.06338377
## obsDorc1     0.34919830  1.36746429
## Stipa11     -0.80881505  0.19330747
## Stipa21      0.52642598  1.58060456
## Cornul1      0.03193108  1.11909858
## SeasonDry   -0.90590388  0.57254276
## SeasonWet   -0.93412075  0.43247988
## Year2009    -2.55754270 -0.55183601
## Year2010    -3.18580305 -1.14165754
## Year2011    -3.40332215 -0.93720672
## Year2012    -3.94420059 -1.46871335
## Year2013    -2.34800464 -0.38182077
## Year2014    -5.16458354 -2.82441852
# Graph results
visreg(glm.Addax,scale="response",ylab="Prob",partial=TRUE,line=list(col="blue"),fill=list(col="gray"),ylim=c(0,1))

# Plot the coefficients
coefplot(glm.Addax, plot=TRUE, mar=c(1,4,5.1,2), intercept=FALSE, vertical=TRUE, main="", var.las=1, frame.plot=FALSE)

To plot the responses on the original scale, you need to back-transform them. Here, I show how to do this for NDVI. From this code, could you do the same thing for the surface roughness variable? How would you edit the code?

# Graph the NDVI response
  # First, extract the Min and Max values from the summary.data above
  MinVal <- summary.data[1,2]
  MaxVal <- summary.data[6,2]

    # Then, set the sequence in which to plot the values 
    divs <- 100
    x <- seq(MinVal, MaxVal, length.out=divs)
    x.unscale <- x*Scale.Val[2,2]+Scale.Val[1,2]

# In the visreg function, you then just need to specify which variable you want to plot.
# You can then update the unscaled values into the plot, by specifying the axis with these values.
visreg(glm.Addax,"sndvi",scale="response",ylab="Probability of Occurrence",xlab="NDVI",partial=TRUE, axes=FALSE, rug=0, ylim=c(0,1),line=list(col="black",lwd=2,lty=1),fill=list(col="grey"),points=list(col="black",cex=0.25,pch=19),frame=FALSE,main="Addax")
axis(2,col="black",col.axis="black")
axis(1,at=c(x[1],x[50],x[100]),lab=c(round(x.unscale[1],digits=2),round(x.unscale[50],digits=2),round(x.unscale[100],digits=2)),col="black",col.axis="black")

Surface Roughness

How would you plot surface roughness instead?

You can easily edit the code described above to plot any of the other variables included in the model. The key is to change the columns referenced in the summary.data and Scale.Val tables so that the correct values are printed on the x-axis. Compare how the following code differs when plotting surface roughness.

  # Surface Roughness

  # Really, all you need to do is recognize that you need to reference a different column in the summary.data and Scale.Val dataframes.
  # Change the column values below
  MinVal <- summary.data[1,1] # Note the change in column number
  MaxVal <- summary.data[6,1] # Note the change in column number

    # Then, set the sequence in which to plot the values 
    divs <- 100
    x <- seq(MinVal, MaxVal, length.out=divs)
    x.unscale <- x*Scale.Val[2,1]+Scale.Val[1,1] # Note the change in column number

# Now, you simply need to change the variable that you want to plot ('srough') and the x label (xlab)
visreg(glm.Addax,"srough",scale="response",ylab="Probability of Occurrence",xlab="Surface Roughness",partial=TRUE, axes=FALSE, rug=0, ylim=c(0,1),line=list(col="black",lwd=2,lty=1),fill=list(col="grey"),points=list(col="black",cex=0.25,pch=19),frame=FALSE,main="Addax")
axis(2,col="black",col.axis="black")
axis(1,at=c(x[1],x[50],x[100]),lab=c(round(x.unscale[1],digits=2),round(x.unscale[50],digits=2),round(x.unscale[100],digits=2)),col="black",col.axis="black")

Validation

How good are the models? We can compare our full model to a null model, look at the Area Under the Curve (AUC) statistic, and perform a cross-validation analysis. AUC summarizes the trade-off between the true positive rate (sensitivity) and the false positive rate (1 − specificity) across classification thresholds. The approach was first developed during World War II to assess the accuracy of radar detections of aircraft. We want the AUC to be as close to 1 as possible. Some guidelines:

  • 0.9 - 1: Excellent (A)
  • 0.8 - 0.9: Good (B)
  • 0.7 - 0.8: Fair (C)
  • 0.6 - 0.7: Poor (D)
  • 0.5 - 0.6: Fail (F)

Cross-validation is a technique that partitions the original data into training and test datasets. The training set is used to fit the model, and the test set is used to evaluate it. Data are randomly divided among a number of ‘folds’. As each fold is withheld, the remaining data are used to re-fit the model, which is then used to predict the withheld observations. If an ancillary dataset is available, it can also be used to test the model; this would be the optimal way to evaluate it.
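The AUC has a useful interpretation: it equals the probability that a randomly chosen presence receives a higher predicted value than a randomly chosen absence (ties count half). A base-R sketch with made-up predictions, not the addax model output:

```r
pred_pres <- c(0.9, 0.8, 0.6, 0.4)  # made-up predictions at presence plots
pred_abs  <- c(0.7, 0.5, 0.3, 0.2)  # made-up predictions at absence plots

# Score every presence/absence pair: 1 if presence ranks higher, 0.5 if tied
pairs <- expand.grid(p = pred_pres, a = pred_abs)
auc   <- mean(ifelse(pairs$p > pairs$a, 1,
                     ifelse(pairs$p == pairs$a, 0.5, 0)))
auc  # 13 of 16 pairs correctly ranked: 0.8125
```

pROC’s roc() computes the same quantity (plus the full ROC curve) from the fitted probabilities, as shown below.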

# What's the AUC?
predpr <- predict(glm.Addax, type=c("response"))
(roccurve <- roc(Sub.Scale$obsAddax ~ predpr))
## 
## Call:
## roc.formula(formula = Sub.Scale$obsAddax ~ predpr)
## 
## Data: predpr in 859 controls (Sub.Scale$obsAddax 0) < 131 cases (Sub.Scale$obsAddax 1).
## Area under the curve: 0.8221
plot(roccurve)

# What's the cross-validation statistic?
cv.binary(glm.Addax)
## 
## Fold:  6 9 1 8 4 7 5 2 3 10
## Internal estimate of accuracy = 0.879
## Cross-validation estimate of accuracy = 0.875

Raster Prediction

One of the most valuable parts of a species distribution model is predicting to locations where surveys were not performed. To make predictions at these locations, we need data with wall-to-wall coverage. Unfortunately, only two of the layers incorporated in the full model have full coverage (NDVI and surface roughness).

Create a model with these two layers, assess how it compares with the full model, and predict across the entire study area. As you will see, the model statistics indicate that this submodel is not as good as the full model (compare the AIC, AUC, and cross-validation).

The model is still useful, however, as long as we are clear about its shortcomings (i.e., we’d expect the predictive power to be decreased since we are not including the fine scale data collected at individual plot locations).

Create Model Subset and Evaluate

glm.Addax2 <- glm(obsAddax ~ srough + I(srough^2) + sndvi + I(sndvi^2), data = Sub.Scale, family = binomial(link="logit"))

# Summarize and print confidence intervals
summary(glm.Addax2)
## 
## Call:
## glm(formula = obsAddax ~ srough + I(srough^2) + sndvi + I(sndvi^2), 
##     family = binomial(link = "logit"), data = Sub.Scale)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7518  -0.6128  -0.4999  -0.3055   2.9167  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.31025    0.13616  -9.623  < 2e-16 ***
## srough       0.63602    0.15559   4.088 4.35e-05 ***
## I(srough^2) -0.55024    0.12642  -4.352 1.35e-05 ***
## sndvi        0.08772    0.12001   0.731   0.4648    
## I(sndvi^2)  -0.21425    0.08969  -2.389   0.0169 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 773.74  on 989  degrees of freedom
## Residual deviance: 734.51  on 985  degrees of freedom
## AIC: 744.51
## 
## Number of Fisher Scoring iterations: 6
confint(glm.Addax2)
## Waiting for profiling to be done...
##                  2.5 %      97.5 %
## (Intercept) -1.5806638 -1.04656011
## srough       0.3415535  0.95321239
## I(srough^2) -0.8162905 -0.32039652
## sndvi       -0.1475019  0.32501710
## I(sndvi^2)  -0.4056626 -0.05483861
# AUC
predpr <- predict(glm.Addax2, type=c("response"))
(roccurve <- roc(Sub.Scale$obsAddax ~ predpr))
## 
## Call:
## roc.formula(formula = Sub.Scale$obsAddax ~ predpr)
## 
## Data: predpr in 859 controls (Sub.Scale$obsAddax 0) < 131 cases (Sub.Scale$obsAddax 1).
## Area under the curve: 0.6573
plot(roccurve)

# Cross-Validation
cv.binary(glm.Addax2)
## 
## Fold:  6 3 1 10 5 7 4 8 9 2
## Internal estimate of accuracy = 0.868
## Cross-validation estimate of accuracy = 0.868

Load Raster Layers

Load the NDVI and SRTM data to make a model prediction. Remember that we will need to re-scale the raster layers, since our model coefficients are based on scaled data. We will use an NDVI image from November 2007 because we also have an ancillary dataset (i.e., a flight survey) from that period, which we used as an external dataset to assess model performance (not shown).

NDVI

# Load NDVI data from flight survey date
# Note that this file has 250-meter resolution (MOD13Q1 data product)
# The SRTM data has 30-m resolution
# We need these data sources to have the same resolution in order to make the prediction. Otherwise, we will get an error.
#setwd("D:/Jared/Work/R/SCBI/MODIS/Niger_Validation")

# Use the 2007 data from November for validation...matches the flight survey
ndvi <- raster(paste0(getwd(),"/Data/MOD13Q1_Nov2017.tif"))
# Convert raster to values to actual NDVI values
ndvi <- ndvi*0.0001

# The data need to be summarized at the 2.5-km scale to match the survey points
# focalWeight() creates a circular weight matrix whose values sum to 1,
# so summing the weighted cells (fun=sum) returns the mean within the window
FoGrid <- focalWeight(ndvi,d=2500,type='circle')
# Summarize the NDVI within the focalWeight grid
ndvi2 <- focal(x=ndvi,w=FoGrid,fun=sum,na.rm=TRUE)

# Plot the two results
par(mfrow=c(1,2))
plot(ndvi)
plot(ndvi2)

Surface Roughness

Now repeat the procedure for the surface roughness layer. Rough is a 30-m resolution file, so it will take longer to process.

# Create a different focalWeight grid because the cell resolutions are different (30 meters instead of 250 meters)
rough <- raster(paste0(getwd(),"/Data/Rough_Sub.tif"))

FoGrid1 <- focalWeight(rough,d=2500,type='circle')
rough2 <- focal(x=rough,w=FoGrid1,fun=sum,na.rm=TRUE)

# Plot to see the two raster layers
plot(rough)

plot(rough2)

Scale Rasters and Resample

Scale the rasters using the summary values (Scale.Val) created above. Then, resample the NDVI raster to match the SRTM data file; the extents of the two files must also match.

# Scale values: (x - mean(x)) / sd(x). To back-transform: x * sd(x) + mean(x)
Scale.Val
##       Rough        NDVI    Dorcas     Human    Addax
## Mn 6.618164 0.092855046  2.922222 0.7393939 1.476768
## Sd 1.464744 0.009629818 10.710220 1.8006750 8.336148
rgh.scale <- (rough2-Scale.Val[1,1])/Scale.Val[2,1]
ndvi.scale <- (ndvi2-Scale.Val[1,2])/Scale.Val[2,2]

# Resample the grids so that they can be combined in the model prediction.
# Resampling the 30-m roughness to 250 m would be faster to compute;
# resampling the 250-m NDVI to 30 m (done here) preserves the finer roughness grid.
ndvi.rsmp <- resample(ndvi.scale,rgh.scale,method="bilinear")

# Compare the resolutions 
compareRaster(rgh.scale,ndvi.rsmp)
## [1] TRUE

Make Final Prediction

Now that the rasters have been scaled, we can use the coefficients to make a final prediction.
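The prediction below applies the inverse logit, exp(η) / (1 + exp(η)), to the linear predictor to convert it to a probability. A quick sketch with arbitrary made-up values on the logit scale, showing that base R’s plogis() computes the same transform:

```r
eta    <- c(-2, 0, 1.5)               # made-up values of the linear predictor
manual <- exp(eta) / (1 + exp(eta))   # inverse logit, written out

all.equal(manual, plogis(eta))        # TRUE: identical to the built-in
```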

# Summarize the model
summary(glm.Addax2)
## 
## Call:
## glm(formula = obsAddax ~ srough + I(srough^2) + sndvi + I(sndvi^2), 
##     family = binomial(link = "logit"), data = Sub.Scale)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7518  -0.6128  -0.4999  -0.3055   2.9167  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.31025    0.13616  -9.623  < 2e-16 ***
## srough       0.63602    0.15559   4.088 4.35e-05 ***
## I(srough^2) -0.55024    0.12642  -4.352 1.35e-05 ***
## sndvi        0.08772    0.12001   0.731   0.4648    
## I(sndvi^2)  -0.21425    0.08969  -2.389   0.0169 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 773.74  on 989  degrees of freedom
## Residual deviance: 734.51  on 985  degrees of freedom
## AIC: 744.51
## 
## Number of Fisher Scoring iterations: 6
coef <- summary(glm.Addax2)
coef <- coef$coefficients

Addax.predict <- (exp(coef[1] + rgh.scale*coef[2] + rgh.scale^2*coef[3] + ndvi.rsmp*coef[4] + ndvi.rsmp^2*coef[5])/
(1 + exp(coef[1] + rgh.scale*coef[2] + rgh.scale^2*coef[3] + ndvi.rsmp*coef[4] + ndvi.rsmp^2*coef[5])))

par(mfrow=c(1,1))
plot(Addax.predict)

# Write the raster to a directory
#writeRaster(Addax.predict, 'EditDirectory.tif', format="GTiff", overwrite=TRUE)